Compiling Array Expressions for Efficient Execution on Distributed-Memory Machines

Authors

  • Sandeep K. S. Gupta
  • S. D. Kaushik
  • Chua-Huang Huang
  • P. Sadayappan
Abstract

Array statements are often used to express data-parallelism in scientific languages such as Fortran 90 and High Performance Fortran. In compiling array statements for a distributed-memory machine, efficient generation of communication sets and local index sets is important. We show that for arrays distributed block-cyclically on multiple processors, the local memory access sequence and communication sets can be efficiently enumerated as closed forms using regular sections. First, closed form solutions are presented for arrays that are distributed using block or cyclic distributions. These closed forms are then used with a virtual processor approach to give an efficient solution for arrays with block-cyclic distributions. This approach is based on viewing a block-cyclic distribution as a block (or cyclic) distribution on a set of virtual processors, which are cyclically (or block-wise) mapped to physical processors. These views are referred to as virtual-block or virtual-cyclic views depending on whether a block or cyclic distribution of the array on the virtual processors is used. The virtual processor approach permits different schemes based on the combination of the virtual processor views chosen for the different arrays involved in an array statement. These virtualization schemes have different indexing overhead. We present a strategy for identifying the virtualization scheme which will have the best performance. Performance results on a Cray T3D system are presented for hand-compiled code for array assignments. These results show that using the virtual processor approach, efficient code can be generated for execution of array statements involving block-cyclically distributed arrays.
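The two virtual processor views described in the abstract can be illustrated with a small sketch. This is not the paper's code: the function names, the 0-based indexing, the (first, last, stride) triple used for a regular section, and the choice of P*k virtual processors for the virtual-cyclic view are all assumptions made for illustration. It enumerates the elements of a one-dimensional array of N elements owned by physical processor p under a block-cyclic distribution with block size k on P processors, once per view.

```python
def owner(i, k, P):
    """Physical processor owning element i (0-based) under a block-cyclic
    distribution with block size k on P processors."""
    return (i // k) % P


def virtual_block_sections(p, N, k, P):
    """Virtual-block view (illustrative): each block of k contiguous elements
    lives on one virtual processor v, and v is mapped cyclically to physical
    processor v % P. Returns one stride-1 section per virtual processor."""
    num_vp = -(-N // k)  # ceil(N / k) virtual processors
    return [(v * k, min(v * k + k, N) - 1, 1) for v in range(p, num_vp, P)]


def virtual_cyclic_sections(p, N, k, P):
    """Virtual-cyclic view (illustrative, assuming P*k virtual processors):
    element i lives on virtual processor i % (P*k), and virtual processors
    are mapped block-wise, k at a time, to physical processor v // k.
    Returns one stride-(P*k) section per virtual processor."""
    period = P * k
    sections = []
    for v in range(p * k, (p + 1) * k):
        if v < N:
            last = v + ((N - 1 - v) // period) * period
            sections.append((v, last, period))
    return sections


def expand(sections):
    """Expand (first, last, stride) sections into a sorted list of indices."""
    return sorted(i for lo, hi, s in sections for i in range(lo, hi + 1, s))


if __name__ == "__main__":
    N, k, P = 20, 2, 3
    for p in range(P):
        print(p, virtual_block_sections(p, N, k, P),
              virtual_cyclic_sections(p, N, k, P))
```

Both views cover the same set of elements for each processor. They differ in how many sections there are and how long each one is: the virtual-block view yields roughly N/(k*P) short stride-1 sections, while the virtual-cyclic view yields at most k long strided sections. That difference is one plausible source of the differing indexing overhead the abstract mentions.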

Similar resources

Compiling Array Statements for Efficient Execution on Distributed-Memory Machines: Two-level

In languages such as High Performance Fortran (HPF), array statements are used for expressing data parallelism. In compiling array statements for distributed-memory machines, efficient enumeration of local index sets and communication sets is important. The virtual processor approach, among several other methods, has been proposed for efficient enumeration of these index sets. In this paper, using ...

Optimal Evaluation of Fortran-90 Array Expressions for Distributed Memory Machines

The owner-computes strategy has been used for evaluation of Fortran-90 array expressions on distributed memory machines. This strategy simplifies code generation but is often expensive in terms of the total communication cost and size of temporary memory required for its implementation. In this paper, we propose the relaxing of the owner-computes strategy, to reduce the total communication and t...

Automatic Array Alignment as a Step in Hierarchical Program Transformation

We present an original approach to automatic array alignment, the step in the hierarchical transformation system aimed at the efficient execution of shared memory programs on distributed memory machines. Our array alignment algorithm deals with a broad set of intra-dimension and inter-dimension alignment preferences, including offsets, strides, permutations, embeddings, and their combinations. ...

Array Operation Synthesis to Optimize HPF Programs

An increasing number of programming languages, such as Fortran 90, HPF, and APL, provide a rich set of intrinsic array functions and array expressions. These constructs, which constitute an important part of data-parallel languages, provide excellent opportunities for compiler optimizations. The synthesis of consecutive array operations or array expressions into a composite access function ...

Improved Probabilistic Routing on Generalized Hypercubes

p. 1 Efficient Data Communication in Incomplete Hypercubes p. 13 Efficient Communication in the Folded Petersen Interconnection Networks p. 25 Compiling Rewriting onto SIMD and MIMD/SIMD Machines p. 37 A Compilation Technique for Varying Communication Cost NUMA Architectures p. 49 A Data Partitioning Algorithm for Distributed Memory Compilation p. 61 Towards a High Precision Massively Parallel ...

Journal:
  • J. Parallel Distrib. Comput.

Volume 32  Issue 

Pages  -

Publication date: 1996